plotly: make dynamic and interactive graphs¶
plotly is a graphing library with language-specific versions for Python, R, and JavaScript.
It's a powerful tool, but searching its documentation can be tricky: check that any page you find applies to the language you're using.
In this example we'll plot Sankey diagrams of Philly's proposed 2019 operating budget.
Import libraries¶
Start by importing the plotly library and the graph_objects submodule:
In [1]:
import plotly
import plotly.graph_objects as go
We'll also need pandas and the HTML class from IPython to display the graph:
In [2]:
import pandas as pd
from IPython.display import HTML
Define functions¶
These functions will handle the following processes:
- filtering the raw dataframe down to a single department
- generating a flow graph (aka "Sankey diagram") of a single department's budget sources and planned spending
- rendering the graph as HTML so that it appears on the documentation webpage
In [3]:
def get_sankey_from_dataframe(
    df: pd.DataFrame,
    source_col: str,
    dest_col: str,
    value_col: str,
    title: str,
) -> go.Figure:
    """
    Parse a dataframe into flows, and return a sankey diagram showing results.

    Args:
        df (pd.DataFrame): input dataframe with source, destination, and value columns
        source_col (str): name of the column with the source data
        dest_col (str): name of the column with the destination data
        value_col (str): name of the column with the flow value data
        title (str): name to put at the top of the sankey diagram

    Returns:
        go.Figure: plotly Sankey diagram from input arguments
    """
    # Create lists to hold extracted data
    source = []
    target = []
    value = []

    # Get a list of all "nodes" for the diagram
    labels_source = list(set(df[source_col]))
    labels_target = list(set(df[dest_col]))
    all_labels = labels_source + labels_target

    # Iterate over the dataframe rows
    for _, row in df.iterrows():
        # Translate src/dest values to integer index value
        source_val = all_labels.index(row[source_col])
        dest_val = all_labels.index(row[dest_col])
        total_val = row[value_col]

        # Add this row's flow to the data lists
        source.append(source_val)
        target.append(dest_val)
        value.append(total_val)

    # Generate the Sankey figure from extracted data
    data = go.Sankey(
        link={"source": source, "target": target, "value": value, "color": "#bbbbbb"},
        node={"label": all_labels, "pad": 20, "thickness": 10},
    )
    layout = go.Layout(title=go.layout.Title(text=title))
    return go.Figure(data=data, layout=layout)


def render_figure(fig: go.Figure) -> HTML:
    """
    Transform a plotly figure into HTML representation that
    shows up in the version of this notebook that appears in the
    documentation website.

    Args:
        fig (go.Figure): a plotly figure

    Returns:
        HTML: html representation of the figure
    """
    return HTML(fig.to_html())


def visualize_department(df: pd.DataFrame, department: str) -> HTML:
    """
    Filter the raw dataframe down to a single department,
    then make and render the figure.

    Args:
        df (pd.DataFrame): raw operating budget dataframe
        department (str): name of the department to visualize

    Returns:
        HTML: html representation of the figure
    """
    filtered_df = df[df["department"] == department]
    fig = get_sankey_from_dataframe(
        filtered_df,
        "fund",
        "class",
        "total",
        f"{department} - Operating Budget",
    )
    return render_figure(fig)
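The core of get_sankey_from_dataframe is translating each source and destination label into an integer node index, because a Sankey link refers to nodes by position in the label list. The sketch below shows one possible way to do that translation with a dictionary instead of repeated list lookups. It uses plain Python tuples rather than a dataframe, and the helper name index_flows and the toy fund and class names are hypothetical, not taken from the budget data.

```python
from typing import Dict, List, Tuple

# Hypothetical toy flows: (source label, destination label, amount).
rows: List[Tuple[str, str, float]] = [
    ("General Fund", "Personal Services", 120.0),
    ("General Fund", "Materials", 30.0),
    ("Grants Fund", "Personal Services", 50.0),
]


def index_flows(
    rows: List[Tuple[str, str, float]],
) -> Tuple[List[str], List[int], List[int], List[float]]:
    """Assign each label an integer index and translate the flows to indices."""
    label_to_index: Dict[str, int] = {}
    source: List[int] = []
    target: List[int] = []
    value: List[float] = []
    for src, dest, amount in rows:
        # setdefault stores the next free index the first time a label is seen
        source.append(label_to_index.setdefault(src, len(label_to_index)))
        target.append(label_to_index.setdefault(dest, len(label_to_index)))
        value.append(amount)
    # Dictionaries keep insertion order, so position i holds the label for index i
    labels = list(label_to_index)
    return labels, source, target, value


labels, source, target, value = index_flows(rows)
```

The four returned lists have the same shape as the all_labels, source, target, and value lists built in the function above, so they could be passed to go.Sankey in the same way. Unlike the set-based version, the node order here follows the order in which labels first appear in the data.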
Generate graphs¶
We'll now use these functions to generate a Sankey diagram for each of several city departments.
First we need to load the raw data into pandas:
In [4]:
url = "https://phl.carto.com/api/v2/sql?q=select+*+from+%20operating_budget_fy_2019_proposed&format=csv&filename=operating_budget_fy_2019_proposed&skipfields=cartodb_id,the_geom,the_geom_webmercator"
df = pd.read_csv(url)
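The URL above packs a SQL query and several options into one long query string, which is hard to read and edit by hand. As a rough sketch, the same request could be assembled from its parts with the standard library. The endpoint and the parameter names (q, format, filename, skipfields) are copied from the URL above; the variable names BASE_URL, TABLE, and params are only illustrative.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint, table name, and parameter names are copied from the URL above
BASE_URL = "https://phl.carto.com/api/v2/sql"
TABLE = "operating_budget_fy_2019_proposed"

params = {
    "q": f"select * from {TABLE}",
    "format": "csv",
    "filename": TABLE,
    "skipfields": "cartodb_id,the_geom,the_geom_webmercator",
}

# urlencode percent-encodes each value: spaces become "+" and "*" becomes "%2A"
url = f"{BASE_URL}?{urlencode(params)}"

# Round-trip check: decoding the query string recovers the original parameters
decoded = parse_qs(urlsplit(url).query)
```

The encoded characters differ slightly from the hand-written URL, but they decode to the same query and options, so the resulting url should be usable with pd.read_csv in the same way. Changing the query then only requires editing the "q" entry.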
Streets¶
In [5]:
visualize_department(df, "Streets")
Out[5]:
Police¶
In [6]:
visualize_department(df, "Police")
Out[6]:
Managing Director's Office¶
In [7]:
visualize_department(df, "Managing Director's Office")
Out[7]:
Water¶
In [8]:
visualize_department(df, "Water")
Out[8]: